Semantify CEUR-WS Proceedings: Towards the Automatic Generation of Highly Descriptive Scholarly Publishing Linked Datasets

Authors

  • Francesco Ronzano
  • Gerard Casamayor
  • Horacio Saggion
Abstract

Rich and fine-grained semantic information describing varied aspects of scientific productions is essential to support their diffusion as well as to properly assess the quality of their output. To foster this trend, in the context of the ESWC2014 Semantic Publishing Challenge, we present a system that automatically generates rich RDF datasets from CEUR-WS workshop proceedings. Proceedings are analyzed through a sequence of processing phases. SVM classifiers complemented by heuristics are used to annotate missing CEUR-WS markups. Annotations are then linked to external datasets like DBpedia and Bibsonomy. Finally, the data is modeled and published as an RDF graph. Our system is provided as an on-line Web service to support on-the-fly RDF generation. In this paper we describe the system and present its evaluation following the procedure set by the organizers of the challenge.
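As a rough illustration of the final phase mentioned in the abstract (modeling annotations as an RDF graph), the sketch below serializes one annotated proceedings record as N-Triples using only the Python standard library. It is not the authors' system: the `http://example.org/` namespace, the `Proceedings` class URI, the record fields, and the sample values are all hypothetical placeholders; only the `rdf:type`, Dublin Core, and `owl:sameAs` property URIs are real vocabulary terms.

```python
# Minimal sketch, not the system described in the paper.
# EX, the Proceedings class, and the record layout are assumptions.
EX = "http://example.org/ceur/"  # hypothetical namespace
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
DC_CONTRIBUTOR = "http://purl.org/dc/elements/1.1/contributor"
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"


def uri(value):
    """Wrap an absolute URI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslash, quote, and newline."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriples(record):
    """Turn one annotated proceedings record into N-Triples lines."""
    subject = uri(EX + record["id"])
    lines = [
        f"{subject} {uri(RDF_TYPE)} {uri(EX + 'Proceedings')} .",
        f"{subject} {uri(DC_TITLE)} {literal(record['title'])} .",
    ]
    for editor in record["editors"]:
        lines.append(f"{subject} {uri(DC_CONTRIBUTOR)} {literal(editor)} .")
    # Links to external datasets would come from the linking phase.
    for link in record.get("same_as", []):
        lines.append(f"{subject} {uri(OWL_SAMEAS)} {uri(link)} .")
    return "\n".join(lines)


# Made-up placeholder record, not real CEUR-WS data.
record = {
    "id": "Vol-0000",
    "title": 'Example "Workshop"',
    "editors": ["A. Editor", "B. Editor"],
    "same_as": ["http://example.org/external/Example_Workshop"],
}

if __name__ == "__main__":
    print(to_ntriples(record))
```

A real pipeline would more likely use an RDF library and an established bibliographic ontology; plain string formatting is used here only to keep the sketch dependency-free.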


Similar resources

Webbing Semantified Scholarly Communication Datasets for Improved Resource Discovery

The success of the Linked Data project has played a vital role in the realization of the Semantic Web on a global stage. It has motivated people to publish datasets that are important for information linking and resource discovery, and that can further contribute to shaping the Web as a single connected data space. This effort has successfully amassed a variety of Linked Data and has introduced man...


Towards a Linked Open Dataset for Scholarly Publishing: Semantic Lancet Project

There is an ever increasing interest in publishing Linked Open Datasets about scientific papers. The current landscape is very fragmented: some projects focus on bibliographic data, others on authorship data, others on citations, and so on. The quality is also heterogeneous and the production and maintenance of such datasets is difficult and time-consuming. In this paper we introduce the Semant...


Semantic Publishing Challenge - Assessing the Quality of Scientific Output

Linked Open Datasets about scholarly publications enable the development and integration of sophisticated end-user services; however, richer datasets are still needed. The first goal of this Challenge was to investigate novel approaches to obtain such semantic data. In particular, we were seeking methods and tools to extract information from scholarly publications, to publish it as LOD, and to ...


Automatic Construction of a Semantic Knowledge Base from CEUR Workshop Proceedings

We present an automatic workflow that performs text segmentation and entity extraction from scientific literature to primarily address Task 2 of the Semantic Publishing Challenge 2015. The goal of Task 2 is to extract various information from full-text papers to represent the context in which a document is written, such as the affiliation of its authors and the corresponding funding bodies. Our...


Unstable markup: A template-based information extraction from web sites with unstable markup

This paper presents the results of work on crawling the CEUR Workshop Proceedings web site into a Linked Open Data (LOD) dataset in the framework of the Semantic Publishing Challenge 2014. Our approach is based on so-called "templates of web site blocks" and on DBpedia for crawling and linking extracted entities.




Publication date: 2014